I love long camping adventures, but I also love camping vicariously through couples like the Watsons, who make adventure their life. The Watsons are particularly cool because they’ve used their talents to make an awesome website with beautiful maps of their journey over the years, as well as data about it.
I was curious about how much this kind of lifestyle costs, so I did some quick plotting. But first, I needed to get the data into R.
While it’s possible to grab data directly from the web, I decided to go with the copy-and-paste approach. I copied the table of camping locations and prices from their website and saved it into a text file. Then, I read it in and cleaned it.
library(tidyverse)
library(lubridate)
library(plotly)
data_raw <- readLines("data.txt")
date_seq <- tibble(arrived = ymd("2012-06-14") + 1:2142)
data <- tibble(text = data_raw[-1]) %>%
  mutate(lines = ceiling(1:n()/3)) %>% # every 3 lines of text make up one record
  group_by(lines) %>%
  summarize(text = str_c(text, collapse = "\t")) %>%
  separate(text,
           into = c("arrived", "campground", "city", "type", "price", "extra"),
           sep = "\t") %>%
  select(-lines, -extra) %>%
  mutate(arrived = mdy(arrived),
         price = str_extract(price, "[0-9]+") %>% as.numeric()) %>%
  separate(city, c("city", "state"), sep = ", ", extra = "merge") %>%
  right_join(date_seq, by = "arrived") %>%
  arrange(arrived) %>% # make sure rows are in date order before filling
  fill(-arrived) # carry each stay forward to the nights that follow it
print(data)
## # A tibble: 2,142 x 6
##    arrived    campground                 city           state  type  price
##    <date>     <chr>                      <chr>          <chr>  <chr> <dbl>
##  1 2012-06-15 Chapin Rd                  Essex          Vermo~ Priv~    0.
##  2 2012-06-16 Long Point State Park      Three Mile Bay New Y~ Stat~   30.
##  3 2012-06-17 Long Point State Park      Three Mile Bay New Y~ Stat~   30.
##  4 2012-06-18 Keuka Lake                 Keuka Park     New Y~ Stat~   26.
##  5 2012-06-19 Keuka Lake                 Keuka Park     New Y~ Stat~   26.
##  6 2012-06-20 Keuka Lake                 Keuka Park     New Y~ Stat~   26.
##  7 2012-06-21 Keuka Lake                 Keuka Park     New Y~ Stat~   26.
##  8 2012-06-22 Sampson State Park         Romulus        New Y~ Stat~   30.
##  9 2012-06-23 Sampson State Park         Romulus        New Y~ Stat~   30.
## 10 2012-06-24 Buckaloons Recreation Area Irvine         Penns~ Nati~   23.
## # ... with 2,132 more rows
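The least obvious part of the cleaning is how a table with one row per arrival becomes one row per night. Here is a minimal sketch of that join-and-fill idea on a toy table. The campground names, dates, and prices are made up for illustration, and the `arrange()` call is my own precaution, since the row order returned by `right_join()` has differed between dplyr versions and `fill()` depends on it.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy arrivals (made-up values): one row per stay, not per night
stays <- tibble(
  arrived    = ymd(c("2020-01-01", "2020-01-03", "2020-01-06")),
  campground = c("Camp A", "Camp B", "Camp C"),
  price      = c(0, 30, 26)
)

# One row for every night in the period (Jan 1 through Jan 7)
nights <- tibble(arrived = ymd("2020-01-01") + 0:6)

per_night <- stays %>%
  right_join(nights, by = "arrived") %>% # nights without an arrival get NA
  arrange(arrived) %>%                   # fill() works down the rows in order
  fill(-arrived)                         # carry the last stay forward

print(per_night)
```

Each night now carries the campground and price of the most recent arrival, which is what makes the later `cumsum()` and monthly sums work.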
With the data in R, I could do some plotting! I started with the cumulative cost of the trip: how much the Watsons have spent so far on accommodation alone.
plot_cumcost <- data %>%
  mutate(cum_cost = cumsum(price)) %>%
  ggplot(aes(x = arrived, y = cum_cost)) +
  geom_point() +
  xlab("Date") + ylab("Cumulative cost ($)")
ggplotly(plot_cumcost)
Most of us think of housing costs in terms of the cost per month, so let’s redo the graph a little:
data_monthly <- data %>%
  mutate(month = month(arrived),
         year = year(arrived),
         date = ymd(paste(year, month, 1, sep = "-")),
         date_lab = paste(year, month, sep = "-")) %>% # for labeling
  group_by(date, date_lab) %>%
  summarize(month_price = sum(price))
plot_monthly <- data_monthly %>%
  ggplot(aes(x = date, y = month_price)) +
  geom_line() +
  geom_point(aes(text = date_lab)) +
  xlab("Date") + ylab("Cost per month ($)")
ggplotly(plot_monthly, tooltip = c("month_price", "text"))
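As an aside, the month-start date can also be built with lubridate's `floor_date()` instead of pasting year and month together. This is just an alternative sketch on toy data (the prices are made up), not what the code above does:

```r
library(dplyr)
library(lubridate)

# Toy nightly prices (made-up values), one row per night
nightly <- tibble(
  arrived = ymd("2020-01-30") + 0:3, # Jan 30, Jan 31, Feb 1, Feb 2
  price   = c(10, 20, 30, 40)
)

monthly <- nightly %>%
  mutate(date = floor_date(arrived, unit = "month")) %>% # first day of the month
  group_by(date) %>%
  summarize(month_price = sum(price), .groups = "drop")

print(monthly)
```

A label like the one used for the tooltip could then be derived with `format(date, "%Y-%m")`.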
Now we see that early 2017 was an expensive time for the Watsons, but otherwise, they were generally keeping their accommodation costs to less than $1000/month! Not bad!
#remove the first and last month since they might be incomplete
data_monthly_clean <- data_monthly[-c(1, nrow(data_monthly)),]
mean(data_monthly_clean$month_price)
## [1] 608.9855
In fact, the mean cost is about $600.
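Dropping the first and last rows assumes those are the only partial months. A slightly more explicit variant, sketched here on toy data with a made-up flat price, counts the nights in each month and keeps only months that are fully covered. The `nights` column and the toy table are my own additions for illustration.

```r
library(dplyr)
library(lubridate)

# Toy nightly data (made-up flat price): starts mid-January, ends mid-March 2020
nightly <- tibble(arrived = ymd("2020-01-15") + 0:59, price = 20)

complete_months <- nightly %>%
  mutate(date = floor_date(arrived, unit = "month")) %>%
  group_by(date) %>%
  summarize(month_price = sum(price), nights = n(), .groups = "drop") %>%
  # keep a month only if every one of its days is present
  filter(nights == as.integer(days_in_month(date)))

mean(complete_months$month_price)
```

In this toy example only February survives the filter, since January and March are covered only partially.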
Now let’s look at the types of accommodation, and how they broke down in terms of price. I started off using a boxplot, but I added on geom_jitter because it gave me a better sense of the distribution. Here, I’ve also plotted each campground just once.
plot_types <- data %>%
  distinct(campground, .keep_all = TRUE) %>%
  mutate(type = reorder(type, price)) %>%
  ggplot(aes(x = type, y = price, text = campground)) +
  geom_boxplot(fill = "palegreen") +
  geom_jitter(width = .3) +
  coord_flip() +
  # labels follow the aesthetics, not the flipped axis positions
  xlab("Accommodation type") + ylab("Price ($)")
ggplotly(plot_types, tooltip = c("text", "price"))
Some of these categories seem to be more detailed than necessary, so let’s group some together. “Boondocking” by definition denotes $0, but maybe we can gain some insight if we group these categories into federal, state, private, and miscellaneous government land.
data_refact <- data %>%
  distinct(campground, .keep_all = TRUE) %>%
  mutate(type = fct_collapse(type,
    Federal = c("Army Corps of Engineers",
                "BLM Boondocking",
                "BLM Campground",
                "National Forest",
                "National Forest Boondocking",
                "National Park",
                "National Park Boondocking",
                "Tennessee Valley Authority"),
    State = c("Montana Fish & Wildlife",
              "State Forest Campground",
              "State Park",
              "State Park Boondocking"),
    Private = c("Parking Lot",
                "Private Residence",
                "Private RV Park",
                "House Rental"),
    OtherGov = c("City Park",
                 "County Park")))
plot_refact <- data_refact %>%
  mutate(type = reorder(type, price)) %>%
  ggplot(aes(x = type, y = price, text = campground, color = type)) +
  geom_jitter(width = .3) +
  coord_flip() +
  # labels follow the aesthetics, not the flipped axis positions
  xlab("Accommodation type") + ylab("Price ($)")
ggplotly(plot_refact, tooltip = c("text", "price")) %>%
  hide_legend()
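If fct_collapse is new to you, here is a small sketch of what the regrouping does, using a toy vector with just a handful of the type labels from above (the vector itself is made up for illustration). Each argument names a new level and lists the old levels that get folded into it; levels you don't mention are left as they are.

```r
library(forcats)

# Toy vector using a few of the type labels from the code above
type <- factor(c("State Park", "National Forest", "City Park",
                 "Private RV Park", "National Forest Boondocking"))

grouped <- fct_collapse(type,
  Federal  = c("National Forest", "National Forest Boondocking"),
  State    = "State Park",
  Private  = "Private RV Park",
  OtherGov = "City Park"
)

table(grouped) # counts per collapsed category
```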
Here we can kind of see that federal and state parks are similar in price, but federal land has more boondocking options. The private category now spans the cheapest and most expensive extremes because it includes both parking lots (at Walmarts and casinos, for the most part) and house rentals.
Perhaps what we really need is a comparison across states. A state like California, for example, can be very expensive (at least on the coast), especially compared to a state with a lot of federal land, like Nevada. To keep things simple, I filtered out the wild variation found in the private and other government categories, keeping just State and Federal. I then plotted as before, but this time by state:
plot_states <- data_refact %>%
  filter(type %in% c("State", "Federal"),
         !str_detect(state, "Ontario|British")) %>% # filter out Canada
  mutate(state = reorder(state, price)) %>%
  ggplot(aes(x = state, y = price, color = type, text = campground)) +
  geom_jitter() +
  coord_flip() +
  # labels follow the aesthetics, not the flipped axis positions
  xlab("State") + ylab("Price ($)")
ggplotly(plot_states)
California’s looking pretty crazy! It has both a lot of expensive camping options (I can definitely attest to this) and really cheap (free) options.
Otherwise, camping in the West is (as expected) generally pretty cheap, mostly thanks to amazing federal land and its boondocking splendor.
That’s all the exploration I’ve got time for now. Feel free to file an issue/pull request if you have ideas for what else to look into!